Abstract

This paper considers for the first time end-to-end response-time
analysis for DAG-based real-time task systems implemented on
heterogeneous multicore platforms. The specific analysis problem that is
considered was motivated by an industrial collaboration involving
wireless cellular base stations. The DAG-based systems considered herein
allow intra-task parallelism: while each invocation of a task (i.e., DAG
node) is sequential, successive invocations of a task may execute in
parallel. In the proposed analysis, this characteristic is exploited to
reduce response-time bounds. Additionally, there is some leeway in
choosing how to set tasks' relative deadlines. It is shown that by
resolving such choices holistically via linear programming,
response-time bounds can be further reduced. Finally, in the considered
use case, DAGs are defined based upon just a few templates and
individually often have quite low utilizations. It is shown that, by
combining many such DAGs into one of higher utilization, response-time
bounds can often be drastically lowered. The effectiveness of these
techniques is demonstrated via both case-study and schedulability
experiments.

1  Introduction 

The multicore revolution is currently undergoing a second wave of
innovation in the form of heterogeneous hardware platforms. In the
domain of real-time embedded systems, such platforms may be desirable to
use for a variety of reasons. For example, ARM's big.LITTLE multicore
architecture [7] enables performance and energy concerns to be balanced
by providing a mix of relatively slower, low-power cores and faster,
high-power ones. Unfortunately, the move towards greater heterogeneity
is further complicating software design processes that were already
being challenged on account of the significant parallelism that exists
in “conventional” multicore platforms with identical processors. Such
complications are impeding advancements in the embedded computing
industry today.

Problem considered herein. In this paper, we report on our efforts
towards solving a particular real-time analysis problem concerning
heterogeneity motivated by an industrial collaboration. This problem
pertains to the processing done by cellular base stations in wireless
networks. Due to space constraints, we refrain from delving into
specifics regarding this particular application domain, opting instead
for a more abstract treatment of the problem at hand.

This problem involves the scheduling of real-time dataflows on
heterogeneous computational elements (CEs), such as CPUs, digital signal
processors (DSPs), or one of many types of hardware accelerators. Each
dataflow is represented by a DAG, the nodes (resp., edges) of which
represent tasks (resp., producer/consumer relationships). A given task
is restricted to run on a specific CE type. Task preemption may be
impossible for some CEs and should in any case be discouraged. Each DAG
has a single source task that is invoked periodically. Intra-task
parallelism is allowed in the sense that consecutive jobs (i.e.,
invocations) of the same task can execute in parallel (but each job
executes sequentially). In fact, a later job can finish earlier due to
variations in running times.¹ The DAGs to be supported are defined using
a relatively small number of “templates,” i.e., many DAGs may exist that
are structurally identical. The challenge is to devise a multi-resource,
real-time scheduler for supporting dataflows as described here with
accompanying per-dataflow end-to-end response-time analysis. Note that,
although response-time bounds are required (and should not be too
large), strict deadlines are not enforced.

Related work. The literature on real-time systems includes much work
pertaining to the scheduling of DAG-based task systems on identical
multiprocessor platforms [1, 3, 4, 5, 6, 8, 9, 11, 13, 15, 16, 17, 18,
19, 20, 21, 22]. However, we are not aware of any corresponding work
directed at heterogeneous platforms. Moreover, the problem above has
facets, such as allowing intra-task parallelism and defining potentially
many DAGs based upon relatively few templates, that have not been fully
explored. (Intra-task parallelism has been considered in a limited
context in DAG-related work pertaining to identical multiprocessors [6,
8, 21].) Additionally, in prior work on the real-time scheduling of task
systems (which are not DAG-based) upon heterogeneous platforms,
task-to-CE assignments have been a paramount concern. In our setting,
this is a non-issue, as this assignment is pre-determined based on CE
functionalities.

Beyond the real-time systems community, DAG-based systems implemented on
heterogeneous platforms have been considered before (e.g., [2, 14, 24]).
However, all such work known to us focuses on one-shot, aperiodic
DAG-based jobs, rather than periodic or sporadic DAG-based task systems.
Moreover, real-time issues are considered only obliquely from the
perspectives of job admission control or job makespan minimization.

¹ In the considered application domain, the data produced by subsequent
jobs can be buffered until all prior jobs of the same task have
completed.

Contributions. In this paper, we formalize the problem described above
and then address it by proposing a scheduling approach and associated
end-to-end response-time analysis. In the first part of the paper, we
attack the problem by presenting a transformation process whereby
successive task models are introduced such that: (i) the first task
model directly formalizes the problem above; (ii) prior analysis can be
applied to the last model to obtain response-time bounds under
earliest-deadline-first (EDF) scheduling; and (iii) each successive
model is a refinement of the prior one in the sense that all DAG-based
precedence constraints are preserved. Such a transformation approach was
previously used by Liu and Anderson [18] in work on DAG-based systems,
but that work focused on identical multiprocessors. Moreover, our work
differs from theirs in that we allow intra-task parallelism. This
enables much smaller end-to-end response-time bounds to be derived.

After presenting this transformation process, we discuss two techniques
that can reduce the response-time bounds enabled by this process. The
first technique exploits the fact that some leeway exists in setting
tasks' relative deadlines. By setting more aggressive deadlines for
tasks along “long” paths in a DAG, the overall end-to-end response-time
bound of that DAG can be reduced. We show that such deadline adjustments
can be made by solving a linear program.

The second technique exploits the fact that, in the considered context,
DAGs are defined using relatively few templates and typically have quite
low utilizations. These facts enable us to reduce response-time bounds
by combining many DAGs into one of larger utilization. As a very simple
example, two DAGs with a period of 10 time units might be combined into
one with a period of 5 time units. A response-time-bound reduction is
enabled because these bounds tend to be proportional to periods. In the
considered application domain, combining can be applied much more
extensively: upwards of 40 DAGs may be combinable.

As a final contribution, we evaluate our proposed techniques via
case-study and schedulability experiments. These experiments show that
our techniques can significantly reduce response-time bounds.
Furthermore, our analysis supports “early releasing” [10] (see Sec. 7)
to improve observed end-to-end response times. We experimentally
demonstrate the efficacy of this as well.

Organization. In the rest of this paper, we formalize the considered
problem (Sec. 2), present the refinements mentioned above that enable
the use of prior analysis (Secs. 3-4), show that the bounds arising from
this analysis can be improved via linear programming (Sec. 5) and DAG
combining (Sec. 6), discuss early releasing (Sec. 7), present our
case-study (Sec. 8) and schedulability (Sec. 9) experiments, and
conclude (Sec. 10).


2  System  Model 

In this section, we formalize the dataflow-scheduling problem described
in Sec. 1 and introduce relevant terminology. Each dataflow is
represented by a DAG, as discussed earlier.

We specifically consider a system $G = \{G_1, G_2, \ldots, G_N\}$
comprised of $N$ DAGs. The DAG $G_i$ consists of $n_i$ nodes, which
correspond to $n_i$ tasks, denoted $\tau_i^1, \tau_i^2, \ldots,
\tau_i^{n_i}$. Each task $\tau_i^v$ releases a (potentially infinite)
sequence of jobs $J_{i,1}^v, J_{i,2}^v, \ldots$. The edges in $G_i$
reflect producer/consumer relationships. A particular task $\tau_i^v$'s
producers are those tasks with outgoing edges directed to $\tau_i^v$,
and its consumers are those with incoming edges directed from
$\tau_i^v$. The $j$th job of task $\tau_i^v$, $J_{i,j}^v$, cannot
commence execution until the $j$th jobs of all of its producers have
completed; this ensures that its necessary input data is available. Such
job dependencies only exist with respect to the same invocation of a
DAG, and not across different invocations. That is, while jobs must
execute sequentially, intra-task parallelism is allowed.

Example 1. Fig. 1 shows an example DAG, $G_1$. Task $\tau_1^4$'s
producers are tasks $\tau_1^2$ and $\tau_1^3$; thus, for any $j$,
$J_{1,j}^4$ needs input data from each of $J_{1,j}^2$ and $J_{1,j}^3$,
so it must wait until those jobs complete. Because intra-task
parallelism is allowed, $J_{1,j}^4$ and $J_{1,j+1}^4$ could potentially
execute in parallel.

To simplify analysis, we assume that each DAG $G_i$ has exactly one
source task $\tau_i^1$, which has only outgoing edges, and one sink task
$\tau_i^{n_i}$, which has only incoming edges. Multi-source/multi-sink
DAGs can be supported with the addition of singular “virtual” sources
and sinks that connect multiple sources and sinks, respectively. Virtual
sources and sinks have a worst-case execution time (WCET) of zero.
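
The virtual-node construction can be illustrated with a minimal Python sketch. The function name `add_virtual_endpoints`, the node labels, and the edge-list representation are assumptions made only for this illustration; they are not part of the model itself.

```python
def add_virtual_endpoints(nodes, edges, wcet):
    """Return (nodes, edges, wcet) with at most one source and one sink.

    nodes: list of hashable task labels
    edges: list of (producer, consumer) pairs
    wcet:  dict mapping task label -> worst-case execution time
    """
    has_incoming = {c for _, c in edges}
    has_outgoing = {p for p, _ in edges}
    sources = [n for n in nodes if n not in has_incoming]
    sinks = [n for n in nodes if n not in has_outgoing]

    nodes, edges, wcet = list(nodes), list(edges), dict(wcet)
    if len(sources) > 1:
        # One virtual source feeding every original source; WCET is zero.
        nodes.insert(0, "virtual_source")
        wcet["virtual_source"] = 0
        edges += [("virtual_source", s) for s in sources]
    if len(sinks) > 1:
        # One virtual sink fed by every original sink; WCET is zero.
        nodes.append("virtual_sink")
        wcet["virtual_sink"] = 0
        edges += [(s, "virtual_sink") for s in sinks]
    return nodes, edges, wcet


# Hypothetical DAG with two sources (a, b) and two sinks (c, d).
nodes, edges, wcet = add_virtual_endpoints(
    ["a", "b", "c", "d"],
    [("a", "c"), ("a", "d"), ("b", "d")],
    {"a": 2, "b": 1, "c": 3, "d": 2},
)
```

After the call, the DAG has a single source and a single sink, so the single-source/single-sink assumption used in the rest of the analysis holds.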

We consider the scheduling of DAGs as just described on a heterogeneous
hardware platform consisting of different types of CEs. A given CE might
be a CPU, DSP, or some specialized hardware accelerator (HAC). The CEs
are organized in $M$ CE pools, where each CE pool $\pi_k$ consists of
$m_k$ identical CEs. Each task $\tau_i^v$ has a parameter $P_i^v$ that
denotes the particular CE pool on which it must run, i.e., $P_i^v =
\pi_k$ means that each job of $\tau_i^v$ must be scheduled on a CE in
the CE pool $\pi_k$. The WCET of task $\tau_i^v$ is denoted $C_i^v$.

Although the problem description in Sec. 1 indicated that source tasks
are released periodically, we generalize this to allow sporadic
releases, i.e., for the DAG $G_i$, the job releases of $\tau_i^1$ have a
minimum separation time, denoted $T_i$. A non-source task $\tau_i^v$
($v > 1$) releases its $j$th job $J_{i,j}^v$ when the $j$th jobs of all
its producer tasks in $G_i$ have completed. That is, letting
$a_{i,j}^v$ and $f_{i,j}^v$ denote the release (or arrival) and finish
times of $J_{i,j}^v$, respectively,

$a_{i,j}^v = \max\{ f_{i,j}^w \mid \tau_i^w \text{ is a producer of } \tau_i^v \}$.  (1)
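
As a small illustration of rule (1), the following Python sketch computes release and finish times for one invocation of a DAG. It assumes, purely for simplicity, that every job starts the moment it is released (no contention for CEs); the function name, the DAG shape (our reading of Fig. 1), and the execution times are hypothetical.

```python
def release_and_finish(producers, source_release, exec_time):
    """Apply (1) to one DAG invocation, assuming every job starts as soon
    as it is released (no contention for CEs; an assumption of this sketch).

    producers:      dict task -> list of producer tasks, listed in
                    topological order (the source has an empty list)
    source_release: release time of the source job
    exec_time:      dict task -> actual execution time of this job
    """
    release, finish = {}, {}
    for task, prods in producers.items():
        if not prods:
            release[task] = source_release
        else:
            # Equation (1): wait for the latest-finishing producer job.
            release[task] = max(finish[p] for p in prods)
        finish[task] = release[task] + exec_time[task]
    return release, finish


# DAG shaped like G_1 as we read it; execution times are made up.
producers = {1: [], 2: [1], 3: [1], 4: [2, 3]}
release, finish = release_and_finish(producers, 0, {1: 3, 2: 2, 3: 3, 4: 2})
print(release)  # {1: 0, 2: 3, 3: 3, 4: 6}
```

Here task 4 is released at time 6, the later of its two producers' finish times (5 and 6), and the end-to-end response time of this invocation is `finish[4] - release[1]`.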


Figure 1: A DAG $G_1$.

[Figure 2 legend: job release, job deadline, job completion, CPU execution, DSP execution.]

(Assume depicted jobs are scheduled alongside other jobs, which are not shown.)

Figure 2: Example schedule for the DAG $G_1$ in Fig. 1.

The response time of job $J_{i,j}^v$ is defined as $f_{i,j}^v -
a_{i,j}^v$, and the end-to-end response time of the DAG $G_i$ as
$f_{i,j}^{n_i} - a_{i,j}^1$.

Example 2. Fig. 2 depicts an example schedule for the DAG $G_1$ in
Fig. 1, assuming one of the tasks is required to execute on a DSP and
the other tasks are required to execute on a CPU. The first (resp.,
second) job of each task has a lighter (resp., darker) shading to make
them easier to distinguish. Tasks $\tau_1^2$ and $\tau_1^3$ have only
one producer, $\tau_1^1$, so when task $\tau_1^1$ finishes a job at
times 3 and 8, $\tau_1^2$ and $\tau_1^3$ release a job immediately. In
contrast, task $\tau_1^4$ has two producers, $\tau_1^2$ and $\tau_1^3$.
Therefore, $\tau_1^4$ cannot release a job at time 5 when the first of
these producers finishes a job, but rather must wait until time 6 when
the other producer also finishes a job. Note that consecutive jobs of
the same task might execute in parallel (as happens for the first and
second jobs of one task during [11, 12)). Furthermore, for a given task,
a later-released job may even finish earlier than an earlier-released
one due to execution-time variations.


Scheduling. Since many CEs are non-preemptible, we use the
non-preemptive global EDF (G-EDF) scheduling algorithm within each CE
pool. The deadline of job $J_{i,j}^v$ is given by

$d_{i,j}^v = a_{i,j}^v + D_i^v$,  (2)

where $D_i^v$ is the relative deadline of task $\tau_i^v$. For example,
in the example schedule in Fig. 2, relative deadlines of $D_1^1 = 7$,
$D_1^2 = 4$, $D_1^3 = 6$, and $D_1^4 = 8$ are assumed.

In the context of this paper, deadlines mainly serve the purpose of
determining jobs' priorities, rather than strict timing constraints for
individual jobs. Therefore, deadline misses are acceptable as long as
the end-to-end response time of each DAG can be reasonably bounded.

Utilization. We denote the utilization of task $\tau_i^v$ by

$u_i^v = C_i^v / T_i$.  (3)

We also use $\Gamma_k$ to denote the set of tasks that are required to
execute on the CE pool $\pi_k$, i.e.,

$\Gamma_k = \{ \tau_i^v \mid P_i^v = \pi_k \}$.  (4)

The overutilization of a CE pool could cause unbounded response times,
so we require for each $k$,

$\sum_{\tau_i^v \in \Gamma_k} u_i^v \le m_k$.  (5)

3 Offset-Based Independent Tasks

In this section, we present a second task model, which as shown in
Sec. 4 can be viewed as a refinement of that just presented. The prior
model is somewhat problematic because of difficult-to-analyze
dependencies among jobs. In particular, by (1), the release times of
jobs of non-source tasks depend on the finish times of other jobs, and
hence on their execution times. By (2), deadlines (and hence priorities)
of jobs are affected by similar dependencies.

In order to ease analysis difficulties associated with such job
dependencies, we introduce here the offset-based independent task
(obi-task) model. Under this model, tasks are partitioned into groups.
The $i$th such group consists of tasks denoted $\tau_i^1, \tau_i^2,
\ldots, \tau_i^{n_i}$, where $\tau_i^1$ is a designated source task that
releases jobs sporadically with a minimum separation of $T_i$. That is,
for any positive integer $j$,

$a_{i,j+1}^1 - a_{i,j}^1 \ge T_i$.  (6)

Job releases of each non-source task $\tau_i^v$ are governed by a new
parameter $\Phi_i^v$, called the offset of $\tau_i^v$. Specifically,
$\tau_i^v$ releases its $j$th job exactly $\Phi_i^v$ time units after
the release time of the $j$th job of the source task $\tau_i^1$ of its
group. That is,

$a_{i,j}^v = a_{i,j}^1 + \Phi_i^v$.  (7)

For consistency, we define

$\Phi_i^1 = 0$.  (8)


Under the obi-task model, a job of a task $\tau_i^v$ can be scheduled at
any time after its release independently of the execution of any other
jobs, even jobs of the same task $\tau_i^v$.

The definitions so far have dealt with job releases. Additionally, the
two per-task parameters $C_i^v$ and $P_i^v$ from Sec. 2 are retained
with the same definitions.

The following property shows that every obi-task $\tau_i^v$ has a
minimum job-release separation of $T_i$.

Property 1. For any obi-task $\tau_i^v$, $a_{i,j+1}^v - a_{i,j}^v \ge T_i$.

Proof.

$a_{i,j+1}^v - a_{i,j}^v$
  $=$ {by (7)}
$(a_{i,j+1}^1 + \Phi_i^v) - (a_{i,j}^1 + \Phi_i^v)$
  $\ge$ {by (6)}
$T_i$.  □
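
To make (7) and Property 1 concrete, the following Python sketch derives every task's release times from the source's releases and a set of offsets. The function name `obi_releases`, the separation value, and the release pattern are hypothetical; the offsets are the ones used later in Example 3.

```python
def obi_releases(source_releases, offsets):
    """Release times under the obi-task model, per (7).

    source_releases: release times of the source task's jobs
    offsets:         dict task -> offset (the source's offset is 0, per (8))
    """
    return {task: [a + phi for a in source_releases]
            for task, phi in offsets.items()}


T = 5                               # hypothetical minimum separation
source_releases = [0, 5, 11, 16]    # sporadic: gaps of at least T
releases = obi_releases(source_releases, {1: 0, 2: 9, 3: 9, 4: 16})

# Property 1: every task inherits the source's minimum separation.
for times in releases.values():
    assert all(b - a >= T for a, b in zip(times, times[1:]))
```

Because each task's releases are a fixed shift of the source's releases, they no longer depend on when any other job finishes, which is what makes the model easier to analyze.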
4 Response-Time Bounds

In this section, we establish two results that enable prior work to be
leveraged to establish response-time bounds for DAG-based task systems.
First, we show that, under the obi-task model with arbitrary offset
settings, per-task response-time bounds can be derived by exploiting
prior work pertaining to a task model called the npc-sporadic task model
(“npc” stands for “no precedence constraints”; this refers to the lack
of precedence constraints among jobs of the same task) [12, 25]. Second,
we show that, by properly setting offsets, any DAG-based task system can
be transformed to a corresponding obi-task system.


4.1 Response-Time Bounds for Obi-Tasks

An npc-sporadic task $\tau_i$ is specified by $(C_i, T_i, D_i)$, where
$C_i$ is its WCET, $T_i$ is the minimum separation time between
consecutive job releases of $\tau_i$, and $D_i$ is its relative
deadline. As before, $\tau_i$'s utilization is $u_i = C_i/T_i$.

The main difference between the conventional sporadic task model and the
npc-sporadic task model is that the former requires successive jobs of
each task to execute in sequence while the latter allows them to execute
in parallel. That is, under the conventional sporadic task model, job
$J_{i,j+1}$ cannot commence execution until its predecessor $J_{i,j}$
completes, even if $a_{i,j+1}$, the release time of $J_{i,j+1}$, has
elapsed. In contrast, under the npc-sporadic task model, any job can
execute as soon as it is released. Note that, although we allow
intra-task parallelism, each individual job still must execute
sequentially.

Yang and Anderson [25] investigated the G-EDF scheduling of npc-sporadic
tasks on uniform heterogeneous multiprocessor platforms where different
processors may have different speeds. By setting each processor's speed
to be 1.0, the following theorem follows from their work.

Theorem 1. (Follows from Theorem 4 in [25]) Consider the scheduling of a
set of npc-sporadic tasks $\tau$ on $m$ identical multiprocessors. Under
non-preemptive G-EDF, each npc-task $\tau_i \in \tau$ has the following
response-time bound, provided $\sum_{\tau_l \in \tau} u_l \le m$:

$R_i = \frac{1}{m}\left( D_i \cdot \sum_{\tau_l \in \tau} u_l + \sum_{\tau_l \in \tau} \left( u_l \cdot \max\{0, T_l - D_l\} \right) \right) + \max_{\tau_l \in \tau}\{C_l\} + \frac{m-1}{m} \cdot C_i$.

We now show that Theorem 1 can be applied to obtain per-task
response-time bounds for any obi-task set.

Concrete vs. non-concrete. A concrete sequence of job releases that
satisfies a task's specification (under either the obi- or npc-sporadic
task model) is called an instantiation of that task. An instantiation of
a task set is defined similarly. In contrast, a task or a task set that
can have multiple (potentially infinite) instantiations satisfying its
specification (e.g., minimum release separation) is called non-concrete.

By Property 1, any instantiation of an obi-task $\tau_i^v$ is an
instantiation of the npc-sporadic task $\tau_i^v = (C_i^v, T_i, D_i^v)$.
Hence, any instantiation of an obi-task set $\{\tau_i^v \mid P_i^v =
\pi_k\}$ is an instantiation of the npc-sporadic task set $\{\tau_i^v
\mid P_i^v = \pi_k\}$. Also, since obi-tasks execute independently of
one another, obi-tasks executing in different CE pools cannot affect
each other. Since each CE pool $\pi_k$ has $m_k$ identical processors,
the problem we must consider is that of scheduling an instantiation of
the npc-sporadic task set $\{\tau_i^v \mid P_i^v = \pi_k\}$ on $m_k$
identical processors. Since Theorem 1 applies to a non-concrete
npc-sporadic task set, it applies to every concrete instantiation of
such a task set. Thus, we have the following response-time bound for
each obi-task $\tau_i^v$:

$R_i^v = \frac{1}{m_k}\left( D_i^v \cdot \sum_{\tau_x^y \in \Gamma_k} u_x^y + \sum_{\tau_x^y \in \Gamma_k} \left( u_x^y \cdot \max\{0, T_x - D_x^y\} \right) \right) + \max_{\tau_x^y \in \Gamma_k}\{C_x^y\} + \frac{m_k - 1}{m_k} \cdot C_i^v$,  (9)

where $\pi_k = P_i^v$ and $\Gamma_k = \{\tau_x^y \mid P_x^y = \pi_k\}$.

Note that (9) is applicable as long as all relative deadlines are
non-negative, and applies to any arbitrary offset setting.
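
As a rough sketch of how bound (9) might be evaluated for one CE pool, consider the Python function below. The function name, the tuple layout, and the example task parameters are hypothetical. The final term, $(m_k - 1)/m_k \cdot C_i^v$, reflects our reading of (9) and should be checked against [25] before being relied upon.

```python
def obi_response_bounds(tasks, m_k):
    """Evaluate bound (9) for every task assigned to one CE pool.

    tasks: dict name -> (C, T, D) with WCET C, minimum separation T,
           and relative deadline D (D >= 0)
    m_k:   number of identical CEs in the pool
    """
    total_u = sum(C / T for C, T, _ in tasks.values())
    if total_u > m_k:
        raise ValueError("pool is overutilized; (5) is violated")
    # Terms shared by all tasks in the pool.
    slack_term = sum((C / T) * max(0, T - D) for C, T, D in tasks.values())
    max_c = max(C for C, _, _ in tasks.values())
    return {
        # Last term: our reading of (9); an assumption of this sketch.
        name: (D * total_u + slack_term) / m_k + max_c + (m_k - 1) / m_k * C
        for name, (C, T, D) in tasks.items()
    }


# Hypothetical pool with two CEs and three tasks.
pool = {"a": (2, 10, 10), "b": (4, 10, 5), "c": (1, 5, 5)}
bounds = obi_response_bounds(pool, m_k=2)
```

For this made-up pool, the bounds come out to roughly 10, 9, and 7.5 time units for tasks a, b, and c. Note that a task's own deadline enters only through the first term, while every task's deadline contributes to the shared second term.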

4.2  From  DAG-Based  Task  Sets  to  Obi-Task  Sets 

We now show that, by properly setting offsets, any DAG-based task set
can be transformed to an obi-task set, and per-DAG end-to-end
response-time bounds can be derived by leveraging the obi-task
response-time bounds just stated.

Any DAG-based task set can be implemented by an obi-task set in an
obvious way: each DAG becomes an obi-task group with the same task
designated as its source, and all $T_i$, $C_i^v$, and $P_i^v$ parameters
are retained without modification. What is less obvious is how to define
task offsets under the obi-task model. This is done by setting each
$\Phi_i^v$ ($v \ne 1$) parameter to be a constant such that

$\Phi_i^v \ge \max_{\tau_i^w \in prod(\tau_i^v)} \{ \Phi_i^w + R_i^w \}$,  (10)

where $prod(\tau_i^v)$ denotes the set of obi-tasks corresponding to the
DAG-based tasks that are the producers of the DAG-based task $\tau_i^v$
in $G_i$, and $R_i^w$ denotes a response-time bound for the obi-task
$\tau_i^w$. For now, we assume that $R_i^w$ is known, but later, we will
show how to compute it.

Example 3. Consider again the DAG $G_1$ in Fig. 1. Assume that, after
applying the above transformation, the obi-tasks have response-time
bounds of $R_1^1 = 9$, $R_1^2 = 5$, $R_1^3 = 7$, and $R_1^4 = 8$,
respectively. Then, we can set $\Phi_1^1 = 0$, $\Phi_1^2 = 9$,
$\Phi_1^3 = 9$, and $\Phi_1^4 = 16$, respectively, and satisfy (10).
With these response-time bounds, the end-to-end response-time bound that
can be guaranteed is determined by $R_1^1$, $R_1^3$, and $R_1^4$ and is
given by $R_1 = 24$. Fig. 3 depicts a possible schedule for these
obi-tasks and illustrates the transformation. Like in Fig. 2, the first
(resp., second) job of each task has a lighter (resp., darker) shading,
and intra-task parallelism is possible (e.g., $J_{1,1}^4$ and
$J_{1,2}^4$ in time interval [23, 24)).

(Assume depicted jobs are scheduled alongside other jobs, which are not shown.)

Figure 3: Example schedule of the obi-tasks corresponding to the
DAG-based tasks in $G_1$ in Fig. 1.

The following properties follow from this transformation process.
According to Property 2, a DAG-based task set can be implemented by a
corresponding set of obi-tasks, and all producer/consumer constraints in
the DAG-based specification will be implicitly guaranteed, provided the
offsets of the obi-tasks are properly set (i.e., satisfy (10)).

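Property 2. If $\tau_i^w$ is a producer of $\tau_i^v$ in the DAG-based
task system, then for the $j$th jobs of the corresponding two obi-tasks,
$f_{i,j}^w \le a_{i,j}^v$.

Proof. By (7), $a_{i,j}^w = a_{i,j}^1 + \Phi_i^w$, and by the definition
of $R_i^w$, $f_{i,j}^w \le a_{i,j}^w + R_i^w$. Thus,

$f_{i,j}^w \le a_{i,j}^1 + \Phi_i^w + R_i^w$.  (11)

By (7), $a_{i,j}^v = a_{i,j}^1 + \Phi_i^v$, and by (10), $\Phi_i^v \ge
\Phi_i^w + R_i^w$. Thus,

$a_{i,j}^v \ge a_{i,j}^1 + \Phi_i^w + R_i^w$.  (12)

By (11) and (12), $f_{i,j}^w \le a_{i,j}^v$.  □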

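Property 3 shows how to compute an end-to-end response-time bound $R_i$.

Property 3. In the obi-task system, for each $j$, all jobs $J_{i,j}^1,
J_{i,j}^2, \ldots, J_{i,j}^{n_i}$ finish their execution within $R_i$
time units after $a_{i,j}^1$, where

$R_i = \Phi_i^{n_i} + R_i^{n_i}$.  (13)

Proof. By (7) and the definition of $R_i^v$, $J_{i,j}^v$ finishes by
time $a_{i,j}^1 + \Phi_i^v + R_i^v$. Thus, $J_{i,j}^{n_i}$ in particular
finishes within $\Phi_i^{n_i} + R_i^{n_i} = R_i$ time units after
$a_{i,j}^1$. Also, by (10), $\Phi_i^v + R_i^v \le \Phi_i^{n_i}$ for any
non-sink task $\tau_i^v$, since $\tau_i^{n_i}$ is the single sink in
$G_i$ and is therefore reachable from every other task. Because
$\Phi_i^{n_i} \le R_i$, this implies that, for any $v$, $J_{i,j}^v$
finishes within $R_i$ time units after $a_{i,j}^1$.  □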


Thus, a DAG-based task set can be transformed to an obi-task set with
the same per-task parameters. Given these per-task parameters, a
response-time bound for each obi-task can be computed by (9) for any
arbitrary offset setting. Then, we can properly set the offsets for each
obi-task according to (10) by considering the corresponding tasks in
each DAG in topological order, starting with $\Phi_i^1 = 0$ for each
source task $\tau_i^1$, by (8). By Property 2, the resulting obi-task
set satisfies all requirements of the original DAG-based task system,
and by Property 3, an end-to-end response-time bound $R_i$ can be
computed for each DAG $G_i$.

(Note that the response-time bound for a virtual source/sink is not
computed by (9), but is zero by definition, since its WCET is zero. Any
job of such a task completes in zero time as soon as it is released.)
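
The offset-setting procedure just described can be sketched in a few lines of Python. The function name and data layout are assumptions of this sketch; the DAG shape and response-time bounds are those of Example 3, and (10) is satisfied with equality, which is one valid choice among many.

```python
def offsets_and_bound(producers, R):
    """Set offsets per (8) and (10) with equality, then apply (13).

    producers: dict task -> list of producers, listed in topological
               order, with the source first and the single sink last
    R:         dict task -> per-task response-time bound
    """
    phi = {}
    for task, prods in producers.items():
        # (8) for the source; (10) with equality for every other task.
        phi[task] = max((phi[p] + R[p] for p in prods), default=0)
    sink = list(producers)[-1]
    return phi, phi[sink] + R[sink]


# Values from Example 3.
producers = {1: [], 2: [1], 3: [1], 4: [2, 3]}
phi, R_end = offsets_and_bound(producers, {1: 9, 2: 5, 3: 7, 4: 8})
print(phi, R_end)  # {1: 0, 2: 9, 3: 9, 4: 16} 24
```

Choosing equality in (10) yields the smallest offsets consistent with the given per-task bounds, and hence the smallest end-to-end bound obtainable from (13) for those bounds.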

5  Setting  Relative  Deadlines 

In the prior sections, we showed that, by applying our proposed
transformation techniques, an end-to-end response-time bound for each
DAG can be established, given arbitrary but fixed relative-deadline
settings. That is, given $D_i^v \ge 0$ for any $i$, $v$, we can compute
corresponding end-to-end response-time bounds (i.e., $R_i$ for each $i$)
by (9), (8), (10), and (13).

Similar DAG transformation approaches have been presented previously
[11, 18], but under the assumption that intra-task precedence
constraints exist (i.e., jobs of the same task must execute in
sequence). Moreover, in this prior work, per-task relative deadlines
have been defined in a DAG-oblivious way. By considering the actual
structure of such a DAG, it may be possible to reduce its end-to-end
response-time bound by setting its tasks' relative deadlines so as to
favor certain critical paths.

Consider, for example, the DAG illustrated in Fig. 4. Suppose that the
prior analysis yields a response-time bound of 10 for each task, as
depicted within each node. The corresponding end-to-end response-time
bound would then be 40 and is obtained by considering the right-side
path. Now, suppose that we alter the tasks' relative-deadline settings
to favor the tasks along this path at the possible expense of the
remaining task on the left. Further, suppose this modification changes
the per-task response-time bounds to be as depicted in parentheses.
Then, this modification would have the impact of reducing the end-to-end
bound to 32.

In this section, we show that the problem of determining the “best”
relative-deadline settings can be cast as a linear-programming problem,
which can be solved in polynomial time. The proposed linear program (LP)
is developed in the next two subsections.


Figure 4: More highly prioritizing the right-side path in this DAG
decreases its end-to-end response-time bound.

5.1 Linear Program

In our LP, there are three variables per task $\tau_i^v$: $D_i^v$,
$\Phi_i^v$, and $R_i^v$. The parameters $T_i$, $C_i^v$, and $m_k$ are
viewed as constants. Thus, there are $3|V|$ variables in total, where
$|V|$ is the total number of tasks (i.e., nodes) across all DAGs in the
system. Before stating the required constraints, we first establish the
following theorem, which shows that a relative-deadline setting of
$D_x^y > T_x$ is pointless to consider.

Theorem 2. If $D_x^y > T_x$, then by setting $D_x^y = T_x$, $R_x^y$, the
response-time bound of task $\tau_x^y$, will decrease, and each other
task's response-time bound will remain the same.

Proof. To begin, note that, by (9), $\tau_x^y$ does not impact the
response-time bounds of those tasks executing on CE pools other than
$P_x^y$. Therefore, the response-time bounds $\{R_i^v \mid P_i^v \ne
P_x^y\}$ for such tasks are not altered by any change to $D_x^y$.

In the remainder of the proof, we consider a task $\tau_i^v$ such that
$P_i^v = P_x^y = \pi_k$. Let $R_i^v$ and $R_i'^v$ denote the
response-time bounds for $\tau_i^v$ before and after, respectively,
reducing $D_x^y$ to $T_x$. If $i = x$ and $v = y$, then by (9),

$R_x^y - R_x'^y = \frac{D_x^y - T_x}{m_k} \cdot \sum_{\tau_w^z \in \Gamma_k} u_w^z + \frac{u_x^y}{m_k} \left( \max\{0, T_x - D_x^y\} - \max\{0, T_x - T_x\} \right)$
  $>$ {since $D_x^y > T_x$}
$0$.

Alternatively, if $i \ne x$ or $v \ne y$, then by (9),

$R_i^v - R_i'^v = \frac{u_x^y}{m_k} \left( \max\{0, T_x - D_x^y\} - \max\{0, T_x - T_x\} \right)$
  $=$ {since $D_x^y > T_x$}
$0$.

Thus, the theorem follows.  □
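
Theorem 2 can be sanity-checked numerically. The following Python sketch evaluates bound (9) before and after lowering one task's relative deadline to its period. The task names and parameters are made up, and the last term of `bound` reflects our reading of (9); the check itself does not depend on that term, since it is unaffected by relative deadlines.

```python
def bound(name, tasks, m):
    """Bound (9) for one task; tasks maps name -> (C, T, D)."""
    C_i, _, D_i = tasks[name]
    total_u = sum(C / T for C, T, _ in tasks.values())
    slack = sum((C / T) * max(0, T - D) for C, T, D in tasks.values())
    max_c = max(C for C, _, _ in tasks.values())
    return (D_i * total_u + slack) / m + max_c + (m - 1) / m * C_i


before = {"x": (2, 10, 14), "other": (3, 10, 6)}   # D_x = 14 > T_x = 10
after = dict(before, x=(2, 10, 10))                 # D_x lowered to T_x

# The bound of task x strictly decreases ...
assert bound("x", after, 2) < bound("x", before, 2)
# ... while the bound of the other task in the pool is unchanged.
assert bound("other", after, 2) == bound("other", before, 2)
```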

By Theorem 2, the reduction of $D_x^y$ mentioned in the theorem does not
increase the response-time bound for any task. By (10), this implies
that none of the offsets, $\{\Phi_i^v\}$, needs to be increased.
Therefore, by Property 3, no end-to-end response-time bound increases.
These properties motivate our first set of linear constraints.

Constraint Set (i): For each task $\tau_i^v$,

$0 \le D_i^v \le T_i$.

$2|V|$ individual linear inequalities arise from this constraint set,
where $|V|$ is the total number of tasks.

Another issue we must address is that of ensuring that the offset
settings, given by (10), are encoded in our LP. This gives rise to the
next constraint set.

Constraint Set (ii): For each edge from $\tau_i^w$ to $\tau_i^v$ in a
DAG $G_i$,

$\Phi_i^v \ge \Phi_i^w + R_i^w$.

There are $|E|$ distinct constraints in this set, where $|E|$ is the
total number of edges in all DAGs in this system.

Finally,  we  have  a  set  of  constraints  that  are  linear  equal¬ 
ity  constraints. 

Constraint  Set  (iii):  With  Constraints  Set  (i),  it  is  clear  that 
we  can  re-write  (9)  as  follows,  for  each  task  rf, 

Ri  ~  ( °i  ■  E  <+  E  « ■  (T‘  DT))  I 

k  \  r»erfc  r“Grfc  J 

+  max  {Cn  +  (14) 

Tflk 

where $\Gamma_k = \{\tau_j^w \mid P_j^w = \pi_k\}$. Moreover, by (8), for each
DAG $G_i$,

$$\Phi_i^1 = 0.$$

Constraint Set (iii) yields $|V| + |G|$ linear equations,
where $|G|$ denotes the number of DAGs.

Constraint Sets (i), (ii), and (iii) fully specify our LP, with
the exception of the objective function. In this LP, there are
$3|V|$ variables, $2|V| + |E|$ inequality constraints, and $|V| + |G|$
linear equality constraints.
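The LP size stated above follows directly from the task, edge, and DAG counts. As a minimal sketch, assuming each DAG is described by its node count and an edge list (an input format chosen here for illustration) and that the three variables per task are its relative deadline, offset, and response-time bound:

```python
def lp_size(dags):
    """Return (variables, inequalities, equalities) for the LP of Sec. 5.1.

    dags -- list of (num_nodes, edges) pairs; this input format is an
            assumption made for illustration.
    """
    num_tasks = sum(n for n, _ in dags)               # |V|
    num_edges = sum(len(edges) for _, edges in dags)  # |E|
    num_dags = len(dags)                              # |G|
    variables = 3 * num_tasks                 # assumed: D, Phi, R per task
    inequalities = 2 * num_tasks + num_edges  # Constraint Sets (i) and (ii)
    equalities = num_tasks + num_dags         # Constraint Set (iii)
    return variables, inequalities, equalities
```

For example, a system with a four-node diamond DAG and a three-node chain has 7 tasks, 6 edges, and 2 DAGs, so the LP has 21 variables, 20 inequalities, and 9 equalities.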

5.2  Objective  Function 

Different  objective  functions  can  be  specified  for  our  LP 
that  optimize  end-to-end  response-time  bounds  in  different 
senses.  Here,  we  consider  a  few  examples. 

Single-DAG systems. For systems where only a single
DAG, $G_1$, exists, the optimization criterion is clear: the
objective function should minimize that DAG's end-to-end
response-time bound. That is, the desired LP is as follows.

minimize $\Phi_1^{n_1} + R_1^{n_1}$

subject to Constraint Sets (i), (ii), and (iii)

Multiple-DAG systems. For systems containing multiple
DAGs, choices exist as to the optimization criteria to consider.
We list two here.

Minimizing the average end-to-end response-time
bound:

minimize $\sum_i \left(\Phi_i^{n_i} + R_i^{n_i}\right)$

subject to Constraint Sets (i), (ii), and (iii)

Minimizing the maximum end-to-end response-time
bound:

minimize $Y$

subject to $\forall i: \Phi_i^{n_i} + R_i^{n_i} \le Y$

Constraint Sets (i), (ii), and (iii)
Figure 5: Illustration of DAG combining.


if  all  of  its  producers  have  already  finished.  For  example, 
in  Fig.  3,  ,/ji  x  cannot  execute  until  time  9,  even  though  its 
corresponding  producer  job,  J\  , ,  has  finished  by  time  3. 

Observed response times can be improved under
deadline-based scheduling without altering analytical
response-time bounds by using a technique called early
releasing [10]. When early releasing is allowed, a job is eligible
for execution as soon as all of its corresponding producer
jobs have finished, even if this condition is satisfied before
its actual release time. Early releasing does not impact
analytical response-time bounds. In an appendix, we provide
a brief explanation as to why this is the case and consider
an example schedule where early releasing is allowed.
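The eligibility rule with and without early releasing can be stated in a few lines. This is a minimal sketch, not a scheduler: the function name and arguments are hypothetical, and a job without producers (a source job) is assumed to become eligible only at its release time.

```python
def eligible_time(release, producer_finish_times, early_releasing):
    """Earliest time at which a job may start executing.

    release               -- the job's actual (offset-based) release time
    producer_finish_times -- finish times of its corresponding producer jobs
    early_releasing       -- whether early releasing is allowed
    """
    if not producer_finish_times:
        # Assumed: a source job has nothing to wait for but its own release.
        return release
    producers_done = max(producer_finish_times)
    if early_releasing:
        # Eligible as soon as all producers finish, even before release.
        return producers_done
    # Without early releasing, the job must also wait for its release time.
    return max(release, producers_done)
```

With the values from the Fig. 3 discussion (release at time 9, producer finished by time 3), the job is eligible at time 9 without early releasing and at time 3 with it. Release times and deadlines themselves are unchanged in both cases.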


6  DAG  Combining 

In the application domain that motivates our work, the
DAGs to be scheduled typically have quite low utilizations
and are defined based on a relatively small number of
templates that define various computational patterns. Two
DAGs defined using the same template are structurally identical:
they are defined by graphs that are isomorphic, corresponding
nodes from the two graphs perform identical computations,
the source nodes are released at the same time,
etc. Such structurally identical graphs can be combined into
one graph with a reduced period and larger utilization, as
long as any overutilization of the underlying hardware platform
is avoided. Such combining can be a very effective
technique because, as the experiments presented later show,
our response-time bounds tend to be proportional to periods.

We illustrate this idea with a simple example. Consider
two DAGs $G_1$ and $G_2$ with a common period of $T$ that are
structurally identical. A schedule of these two DAGs is illustrated
abstractly at the top of Fig. 5. As illustrated at the
bottom of the figure, if these two DAGs are combined, then
they are replaced by a structurally identical graph, denoted
here as $G_{[1,2]}$, with a period of $T/2$. With this change, the
provided response-time bounds have to be slightly adjusted.
For example, if $G_{[1,2]}$ has a response-time bound of $R_{[1,2]}$,
then this would also be a response-time bound for $G_1$, but
that for $G_2$ would be $R_{[1,2]} + T/2$, because in combining the
two graphs, the releases of $G_2$ are effectively shifted forward
by $T/2$ time units. While this graph combining idea is
really quite simple, the experiments presented later suggest
that it can have a profound impact in the considered application
domain. In particular, in that domain, per-DAG utilizations
are low enough that upwards of 40 DAGs can be
combined into one. Thus, the actual period reduction is not
merely by a factor of $\frac{1}{2}$ but by a factor as high as $\frac{1}{40}$.
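The bound adjustment for combined DAGs can be sketched as follows. The two-DAG case matches the example above. The extension to more than two DAGs, in which the k-th DAG's releases are shifted by k times the combined period, is an assumption made here for illustration, as are the function and parameter names.

```python
from fractions import Fraction


def combine_identical_dags(period, count, combined_bound):
    """Per-DAG end-to-end bounds after combining `count` identical DAGs.

    period         -- common period T of the original DAGs
    count          -- number of structurally identical DAGs being combined
    combined_bound -- end-to-end bound of the combined DAG (period T/count)

    Returns (combined_period, [bound for the 1st DAG, the 2nd DAG, ...]).
    """
    combined_period = Fraction(period) / count
    # Assumed generalization of the two-DAG example: the k-th DAG
    # (k = 0, 1, ...) has its releases shifted forward by k * T/count,
    # so its bound is the combined bound plus that shift.
    bounds = [combined_bound + k * combined_period for k in range(count)]
    return combined_period, bounds
```

For two DAGs with period 10 and a combined bound of 7, this yields a combined period of 5 and per-DAG bounds of 7 and 12. The sketch does not check that the combined DAG avoids overutilizing the platform, which must be verified separately.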

7  Early  Releasing 

Transforming  a  DAG-based  task  system  to  a  correspond¬ 
ing  obi-task  system  enabled  us  to  derive  an  end-to-end 
response-time  bound  for  each  DAG.  However,  such  a  trans¬ 
formation  may  actually  cause  observed  end-to-end  response 
times  at  runtime  to  increase,  because  the  offsets  introduced 
in  the  transformation  may  prevent  a  job  from  executing  even 


8  Case  Study 


To illustrate the computational details of our analysis, we
consider here a case-study system consisting of three DAGs,
$G_1$, $G_2$, and $G_3$, which are specified in Fig. 6. $\pi_1$ is a CE
pool consisting of two identical CPUs, and $\pi_2$ is a CE pool
consisting of two identical DSPs. Thus, $m_1 = m_2 = 2$.
These three DAGs have fewer nodes and higher utilizations
than typically found in our considered application domain.
However, one can imagine that these graphs were obtained
from combining many identical graphs of lower utilization.
While it would have been desirable to consider larger graphs
with more nodes, graphs from our chosen domain typically
have tens of nodes, and this makes them rather unwieldy
to discuss. Still, the general conclusions we draw here are
applicable to larger graphs.


Utilization check. First, we must calculate the total utilization
of all tasks assigned to each CE pool to make
sure that neither is overutilized. We have
$\sum_{\tau_i^v \in \Gamma_1} u_i^v = 1.686 < 2$ and
$\sum_{\tau_i^v \in \Gamma_2} u_i^v = \frac{380}{500} + \frac{16+83+242}{1000} = 1.101 \le 2$.
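The per-pool check is a sum of WCET-to-period ratios compared against the number of CEs in the pool. The sketch below reproduces the $\pi_2$ figure of 1.101, reading the sum in the text as 380/500 + (16+83+242)/1000. The helper name and the (WCET, period) pair representation are illustrative assumptions.

```python
from fractions import Fraction


def pool_utilization(tasks):
    """Total utilization of (wcet, period) pairs, computed exactly."""
    return sum(Fraction(wcet, period) for wcet, period in tasks)


# (WCET, period) pairs for pool pi_2, as read from the sum in the text.
pool_2 = [(380, 500), (16, 1000), (83, 1000), (242, 1000)]
m_2 = 2  # pi_2 consists of two identical DSPs

u_2 = pool_utilization(pool_2)
not_overutilized = u_2 <= m_2
```

Exact rational arithmetic is used so that the comparison against the pool size is not affected by floating-point rounding.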


Virtual source/sink. Note that all DAGs have a single
source and sink, except for $G_2$, which has two sinks. For it,
we connect its two sinks to a single virtual sink $\tau_2^6$, which
has a WCET of 0 and a response-time bound of 0. We call
the resulting DAG $G_2'$. For convenience, we include a depiction
of this graph in Fig. 10 in an appendix.


Implicit deadlines. We now show how to compute
response-time bounds assuming implicit deadlines, i.e.,
$D_1^v = 500$ for $1 \le v \le 4$, $D_2^v = 1000$ for $1 \le v \le 5$
(the relative deadline of the virtual sink is irrelevant), and
$D_3^v = 1000$ for $1 \le v \le 3$. In order to derive an end-to-end
response-time bound, we first transform the original DAG-based
tasks into obi-tasks as described in Sec. 3. Next, we
calculate a response-time bound $R_i^v$ for each obi-task $\tau_i^v$
by (9). The resulting task response-time bounds, $\{R_i^v\}$, are
listed in Table 2, which is given in the appendix. Note that
the response-time bound for the virtual sink $\tau_2^6$ does not
need to be computed by (9), but is 0 by definition.
By (8) and (10), the offsets of the obi-tasks can now be
computed in topological order with respect to each DAG.
The resulting offsets, $\{\Phi_i^v\}$, are also shown in Table 2.
Figure 6: DAGs in the case-study system. G2 has two sinks, so to analyze it, a virtual sink must be added that has a WCET of 0 and a
response-time bound of 0. We show the resulting graph in Fig. 10 in an appendix.


Finally, by Property 3, we have an end-to-end response-time
bound for each DAG: $R_1 = \Phi_1^4 + R_1^4 = 2538.25$,
$R_2 = \Phi_2^6 + R_2^6 = 4361.5$, and $R_3 = \Phi_3^3 + R_3^3 = 3376.5$.
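The offset computation and the final end-to-end bound can be sketched as a single pass over the nodes in topological order. This sketch assumes that (8) fixes the source's offset at 0 and that (10) sets each other offset to the largest value of producer offset plus producer response-time bound, which is consistent with Constraint Set (ii). The node numbering and function name are illustrative, and the values in the usage note are hypothetical rather than taken from Table 2.

```python
def offsets_and_end_to_end(num_nodes, edges, response):
    """Compute obi-task offsets and the end-to-end bound for one DAG.

    Nodes are numbered 1..num_nodes in topological order, with node 1 the
    single source and node num_nodes the single sink (an assumed
    convention). Every non-source node must have at least one producer.
    edges    -- list of (w, v) pairs with w < v
    response -- dict: node -> response-time bound R^v
    """
    predecessors = {v: [] for v in range(1, num_nodes + 1)}
    for w, v in edges:
        predecessors[v].append(w)

    offset = {1: 0}  # assumed reading of (8): the source's offset is 0
    for v in range(2, num_nodes + 1):
        # Assumed reading of (10): release a node only once every
        # producer is guaranteed to have finished.
        offset[v] = max(offset[w] + response[w] for w in predecessors[v])

    sink = num_nodes
    # Property 3: the end-to-end bound is the sink's offset plus its bound.
    return offset, offset[sink] + response[sink]
```

For a hypothetical four-node diamond with edges (1,2), (1,3), (2,4), (3,4) and per-task bounds 2, 5, 3, and 1, the offsets are 0, 2, 2, and 7, and the end-to-end bound is 8.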

LP-based deadline settings. If we use LP techniques to
optimize end-to-end response-time bounds, then choices exist
regarding the objective function, because our system has
multiple DAGs. We consider three choices here.

Minimizing the average end-to-end response-time
bound. For this choice, relative-deadline settings, obi-task
response-time bounds, and obi-task offsets are as shown in
Table 3(a), found in the appendix. The resulting end-to-end
response-time bounds are $R_1 = \Phi_1^4 + R_1^4 = 3134.5$,
$R_2 = \Phi_2^6 + R_2^6 = 2341.2$, and $R_3 = \Phi_3^3 + R_3^3 = 1736.2$.

Minimizing the maximum end-to-end response-time
bound. For this choice, relative-deadline settings, obi-task
response-time bounds, and obi-task offsets are as shown in
Table 3(b). The resulting end-to-end response-time bounds
are $R_1 = \Phi_1^4 + R_1^4 = 2650.4$, $R_2 = \Phi_2^6 + R_2^6 = 2650.4$,
and $R_3 = \Phi_3^3 + R_3^3 = 2650.4$.

Minimizing the maximum proportional end-to-end
response-time bound. For this choice, relative-deadline settings,
obi-task response-time bounds, and obi-task offsets
are as shown in Table 3(c). The resulting end-to-end
response-time bounds are $R_1 = \Phi_1^4 + R_1^4 = 2208.9$,
$R_2 = \Phi_2^6 + R_2^6 = 4417.8$, and $R_3 = \Phi_3^3 + R_3^3 = 4261.0$.
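The reported bounds can be cross-checked against the three objectives. The sketch below uses only the end-to-end bounds stated in this section and the periods implied by the implicit-deadline setting (500 for the first DAG, 1000 for the other two). Treating the "proportional" bound as the end-to-end bound divided by the DAG's period is an assumption, and the dictionary keys are illustrative labels.

```python
PERIODS = {"G1": 500, "G2": 1000, "G3": 1000}

# End-to-end bounds reported in the case study for each deadline setting.
BOUNDS = {
    "implicit": {"G1": 2538.25, "G2": 4361.5, "G3": 3376.5},
    "lp_average": {"G1": 3134.5, "G2": 2341.2, "G3": 1736.2},
    "lp_maximum": {"G1": 2650.4, "G2": 2650.4, "G3": 2650.4},
    "lp_proportional": {"G1": 2208.9, "G2": 4417.8, "G3": 4261.0},
}


def average(bounds):
    return sum(bounds.values()) / len(bounds)


def maximum(bounds):
    return max(bounds.values())


def max_proportional(bounds):
    # Assumed meaning of "proportional": bound divided by the DAG's period.
    return max(bounds[g] / PERIODS[g] for g in bounds)


def best_setting(metric):
    """Name of the deadline setting that minimizes the given metric."""
    return min(BOUNDS, key=lambda name: metric(BOUNDS[name]))
```

Under these assumptions, each LP-based setting is the best of the four on the metric it was optimized for, while it may be worse than the implicit-deadline setting for an individual DAG (for instance, the first DAG's bound rises from 2538.25 to 3134.5 when the average is minimized).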

Early releasing. As discussed in Sec. 7, early releasing
can improve observed response times without compromising
response-time bounds. The value of allowing early releasing
can be seen in the results reported in Table 1. This
table gives the largest observed end-to-end response time of
each DAG, assuming implicit deadlines with and without
early releasing, in a schedule that was simulated for 50,000
time units. Analytical bounds are shown as well.


                      G1         G2         G3
Early releasing       1006       897        453
No early releasing    1966.75    3536.25    2586.0
Bounds                2538.25    4361.5     3376.5

Table 1: Observed end-to-end response times with/without early
releasing and analytical end-to-end response-time bounds for the
implicit-deadline setting.


9  Schedulability  Studies 

In  this  section,  we  expand  upon  the  specific  case  study  just 
described  by  considering  general  schedulability  trends  seen 
for  randomly  generated  task  systems. 

9.1  Improvements  Enabled  by  Basic  Techniques 

We  first  consider  the  improvements  enabled  by  the  ba¬ 
sic  techniques  covered  in  Secs.  4  and  5  that  underlie  our 
work:  allowing  intra-task  parallelism  as  provided  by  the 
npc-sporadic  task  model,  and  determining  relative-deadline 
settings  by  solving  an  LP. 

Random  system  generation.  We  considered  a  heteroge¬ 
neous  platform  comprised  of  three  CE  pools,  each  consist¬ 
ing  of  eight  identical  CEs.  Each  pool  was  assumed  to  have 
the  same  total  utilization.  We  considered  all  choices  of  total 
per-pool  utilizations  in  the  range  [1, 8]  in  increments  of  0.5. 

We generated DAG-based task systems using a method
similar to that used by others [3, 15]. These systems were
generated by first specifying the number of DAGs in the
system, N, and the number of tasks per DAG, n. For
each considered pair N and n, we randomly generated 50
task-system structures, each comprised of N DAGs with n
nodes. Each node in such a structure was randomly assigned
to one of the CE pools, and for each DAG in the structure,
one node was designated as its source, and one as its sink.
Further, each pair of internal nodes (not a source or a sink)
was connected by an edge with probability edgeProb,
a settable parameter. Such an edge was directed from the
lower-indexed node to the higher-indexed node, to preclude
cycles. Finally, an edge was added from the source to each
internal node with no incoming edges, and to the sink from
each internal node with no outgoing edges.
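The structure-generation procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the generator used in the experiments: node 0 is taken as the source and node n-1 as the sink, pools are chosen uniformly at random, and the function name and the seeded `random.Random` argument are choices made here.

```python
import random


def generate_dag_structure(n, edge_prob, num_pools, rng):
    """Generate one DAG structure with n nodes, following Sec. 9.1.

    Node 0 is the source and node n-1 the sink (an assumed numbering).
    Returns (pool_of_node, edges), where edges is a set of (u, v) pairs.
    """
    pool_of_node = [rng.randrange(num_pools) for _ in range(n)]
    source, sink = 0, n - 1
    internal = range(1, n - 1)

    edges = set()
    # Connect each pair of internal nodes with probability edge_prob,
    # always from the lower index to the higher one, so no cycle can form.
    for u in internal:
        for v in range(u + 1, n - 1):
            if rng.random() < edge_prob:
                edges.add((u, v))

    # Record which internal nodes already have edges before attaching
    # the source and sink.
    has_incoming = {v for _, v in edges}
    has_outgoing = {u for u, _ in edges}
    for v in internal:
        if v not in has_incoming:
            edges.add((source, v))
        if v not in has_outgoing:
            edges.add((v, sink))
    return pool_of_node, edges
```

A call such as `generate_dag_structure(20, 0.5, 3, random.Random(0))` produces one 20-node structure over three pools. The sketch assumes at least one internal node; with n = 2 the source and sink would remain unconnected, a case the description does not cover.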

For each considered per-pool utilization and each generated
task-system structure, we randomly generated 50
actual task systems by generating task utilizations using
the MATLAB function randfixedsum() [23]. According
to the application domain that motivates this work,$^2$
we defined each DAG's period to be 1 ms. (A task's
WCET is determined by its utilization and period.) For
each considered value of N, n, and total per-pool utilization
(one point in one of our graphs), we considered 50 (task-system
structures) × 50 (utilizations) = 2,500 task sets.

$^2$In applications usually considered in the real-time-systems
community, much larger periods are the norm. The considered domain
is quite different.

Figure 7: AMERBs as a function of total utilization in each CE
pool in the case where each task set has five DAGs, 20 tasks per
DAG, and edgeProb=0.5.

Comparison  setup.  We  compared  three  strategies:  (i) 
transforming  to  a  conventional  sporadic  task  system  and 
using  implicit  relative  deadlines,  which  is  a  strategy  used 
in  prior  work  on  identical  platforms  [18];  (ii)  transform¬ 
ing  to  an  npc-sporadic  task  system  and  using  implicit  rel¬ 
ative  deadlines;  and  (iii)  transforming  to  an  npc-sporadic 
task  system  and  using  LP-based  relative  deadlines.  When 
applying  our  LP  techniques,  we  chose  the  objective  func¬ 
tion  that  minimizes  the  maximum  end-to-end  response-time 
bound.  Although  an  identical  platform  was  assumed  in  [18], 
the  techniques  from  that  paper  can  be  extended  to  heteroge¬ 
neous  platforms  in  a  similar  way  to  this  paper. 

Results. In all cases that we considered, the two evaluated
techniques improved end-to-end response-time bounds, often
significantly. Due to space constraints, we present here
only the case where N = 5, n = 20, and edgeProb =
0.5. For each generated task set, we recorded the maximum
end-to-end response-time bound among its five DAGs. For
each given total per-pool utilization point, we report here the
average of the maximum end-to-end response-time bounds
among the 2,500 task sets generated for that point. We call
this metric the average maximum end-to-end response-time
bound (AMERB). Fig. 7 plots AMERBs as a function of total
per-CE-pool utilization. As seen, the application of both
techniques reduced AMERBs by 39.42% to 81.65%.
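The AMERB metric is a maximum within each task set followed by an average across task sets. A minimal sketch, with an illustrative function name and input format:

```python
def amerb(task_sets):
    """Average maximum end-to-end response-time bound (AMERB).

    task_sets -- non-empty list of task sets, each given as the non-empty
                 list of its DAGs' end-to-end response-time bounds
    """
    # For each task set, keep the largest bound among its DAGs, then
    # average those maxima over all task sets at this utilization point.
    maxima = [max(bounds) for bounds in task_sets]
    return sum(maxima) / len(maxima)
```

For example, two task sets with bounds [1.0, 3.0] and [2.0, 5.0] have maxima 3.0 and 5.0, giving an AMERB of 4.0. In the experiments, each plotted point would correspond to one such value over the 2,500 task sets generated for that utilization.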

In addition to this plot, we also considered other cases
with different values for N, n, and edgeProb. These
other results show similar trends and can be found in an
online appendix (available at
http://cs.unc.edu/~anderson/papers.html).

9.2  Improvements  Enabled  by  DAG  Combining 

As  mentioned  in  Sec.  6,  in  the  application  domain  that  mo¬ 
tivates  our  work,  DAGs  are  usually  defined  using  several 
well-defined  computational  templates,  and  as  a  result,  many 


identical  DAGs  will  exist.  We  proposed  the  technique  of 
DAG  combining  in  Sec.  6  to  exploit  this  fact  to  further  re¬ 
duce  response-time  bounds.  We  now  discuss  schedulability 
experiments  that  we  conducted  to  evaluate  this  technique. 

Random system generation. We employed a process of
randomly generating systems that is similar to that discussed
in Sec. 9.1, except that, instead of generating task-system
structures comprised of N DAGs, we generated
structures comprised of N templates. Additionally, we introduced
a new parameter $K$ that indicates the number of
identical DAGs per template. A period of 1 ms was still
associated with each DAG.

Comparison setup. We compared two strategies: (i) do no
combining, and compute end-to-end response-time bounds
assuming $N \cdot K$ independent DAGs; (ii) combine identical
DAGs, and compute end-to-end response-time bounds
assuming N DAGs, making adjustments as discussed in
Sec. 6 to obtain actual response-time bounds for the DAGs
that were combined. Under both strategies, the general techniques
evaluated in Sec. 9.1 were applied.

Results. In all cases that we considered, the DAG combining
technique improved end-to-end response-time bounds
significantly. Due to space constraints, we present here only
the case where each system has five templates, each of
which has 20 nodes, and edgeProb = 0.5. Other results
can be found online. For the considered case, Fig. 8 plots
AMERBs as a function of total per-pool utilization, when
the number of identical DAGs per template is fixed to 40
(this number is close to what would be expected in the application
domain that motivates this work). Also, Fig. 9 plots
AMERBs as a function of the number of identical DAGs
per template, when every CE pool is fully utilized (i.e.,
the total utilization of each pool is eight). In this case, the
AMERB metric was calculated over all task sets that have
the same number of identical DAGs per template. Note that
the AMERBs in Fig. 8 are much lower than those in Fig. 7,
even before applying the DAG combining technique. This
is because the systems considered in Fig. 8 have far more
DAGs than those in Fig. 7. As a result, for each given total
per-pool utilization, the systems in Fig. 8 have much lower
per-DAG and per-task utilizations.

According  to  our  industry  partners,  in  the  considered  ap¬ 
plication  domain,  a  DAG’s  end-to-end  response-time  bound 
should  typically  be  at  most  2.35  ms.  As  observed  in  Fig.  8, 
in  the  absence  of  DAG  combining,  AMERBs  in  this  exper¬ 
iment  were  as  high  as  8.2  ms.  However,  the  introduction 
of  DAG  combining  enabled  a  drop  to  less  than  2.0  ms,  even 
when  the  platform  was  fully  utilized.  This  demonstrates  that 
DAG  combining — as  simple  as  it  may  seem — can  have  a 
powerful  impact  in  the  targeted  domain. 

10  Conclusion 

We presented task-transformation techniques to provide
end-to-end response-time bounds for DAG-based tasks implemented
on heterogeneous multiprocessor platforms where
intra-task parallelism is allowed. We also presented an LP-based
method for setting relative deadlines and a DAG
combining technique that can be applied to improve these
bounds. We evaluated the efficacy of these results by considering
a case-study task system and by conducting schedulability
studies. To our knowledge, this paper is the first
to present end-to-end response-time analysis for the considered
context. In future work, we intend to extend these
techniques to deal with synchronization requirements and
to incorporate methods for limiting interference caused by
contention for shared hardware components such as caches,
memory banks, etc.

Figure 8: AMERBs as a function of total utilization in each CE
pool in the case where the number of identical DAGs per template
is fixed to 40.

Figure 9: AMERBs as a function of the number of identical DAGs
per template in the case where total utilization in each CE pool is
fixed to eight.
this paper to our attention and for answering many questions
regarding cellular base stations.

References 

[1] P. Axer, S. Quinton, M. Neukirchner, R. Ernst, B. Dobel, and H. Hartig. Response-time analysis of parallel fork-join workloads with real-time constraints. In 25th ECRTS, 2013.

[2] R. Bajaj and D. Agrawal. Scheduling multiple task graphs in heterogeneous distributed real-time systems by exploiting schedule holes with bin packing techniques. IEEE Transactions on Parallel and Distributed Systems, 15(2):107-118, 2004.

[3] S. Baruah. Improved multiprocessor global schedulability analysis of sporadic DAG task systems. In 26th ECRTS, 2014.

[4] S. Baruah. The federated scheduling of constrained-deadline sporadic DAG task systems. In 18th DATE, 2015.

[5] S. Baruah. Federated scheduling of sporadic DAG task systems. In 29th IPDPS, 2015.

[6] S. Baruah, V. Bonifaci, A. Marchetti-Spaccamela, L. Stougie, and A. Wiese. A generalized parallel task model for recurrent real-time processes. In 33rd RTSS, 2012.

[7] big.LITTLE Processing. http://www.arm.com/products/processors/technologies/biglittleprocessing.php.

[8] V. Bonifaci, A. Marchetti-Spaccamela, S. Stiller, and A. Wiese. Feasibility analysis in the sporadic DAG task model. In 25th ECRTS, 2013.

[9] H. S. Chwa, J. Lee, K. M. Phan, A. Easwaran, and I. Shin. Global EDF schedulability analysis for synchronous parallel tasks on multicore platforms. In 25th ECRTS, 2013.

[10] U. Devi. Soft Real-Time Scheduling on Multiprocessors. PhD thesis, University of North Carolina, Chapel Hill, NC, 2006.

[11] G. Elliott, N. Kim, J. Erickson, C. Liu, and J. Anderson. Minimizing response times of automotive dataflows on multicore. In 20th RTCSA, 2014.

[12] J. Erickson and J. Anderson. Response time bounds for G-EDF without intra-task precedence constraints. In 15th OPODIS, 2011.

[13] J. C. Fonseca, V. Nelis, G. Raravi, and L. M. Pinho. A multi-DAG model for real-time parallel applications with conditional execution. In 30th SAC, 2015.

[14] T. Grandpierre, C. Lavarenne, and Y. Sorel. Optimized rapid prototyping for real-time embedded heterogeneous multiprocessors. In 7th CODES, 1999.

[15] J. Li, K. Agrawal, C. Lu, and C. Gill. Analysis of global EDF for parallel tasks. In 25th ECRTS, 2013.

[16] J. Li, A. Saifullah, K. Agrawal, C. Gill, and C. Lu. Analysis of federated and global scheduling for parallel real-time tasks. In 26th ECRTS, 2014.

[17] J. Li, A. Saifullah, K. Agrawal, C. Gill, and C. Lu. Capacity augmentation bound of federated scheduling for parallel DAG tasks. Technical report, Washington University in St Louis, 2014.

[18] C. Liu and J. Anderson. Supporting soft real-time DAG-based systems on multiprocessors with no utilization loss. In 31st RTSS, 2010.

[19] C. Maia, M. Bertogna, L. Nogueira, and L. M. Pinho. Global EDF schedulability analysis for synchronous parallel tasks on multicore platforms. In 22nd RTNS, 2014.

[20] A. Melani, M. Bertogna, V. Bonifaci, A. Marchetti-Spaccamela, and G. C. Buttazzo. Response-time analysis of conditional DAG tasks in multiprocessor systems. In 27th ECRTS, 2015.

[21] A. Parri, A. Biondi, and M. Marinoni. Response time analysis for G-EDF and G-DM scheduling of sporadic DAG-tasks with arbitrary deadline. In 23rd RTNS, 2015.

[22] A. Saifullah, K. Agrawal, C. Lu, and C. Gill. Multi-core real-time scheduling for generalized parallel task models. In 32nd RTSS, 2011.

[23] R. Stafford. Random vectors with fixed sum. http://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum.

[24] G. Stavrinides and H. Karatza. Scheduling multiple task graphs in heterogeneous distributed real-time systems by exploiting schedule holes with bin packing techniques. Simulation Modelling Practice and Theory, 19(1):540-552, 2011.

[25] K. Yang and J. Anderson. Optimal GEDF-based schedulers that allow intra-task parallelism on heterogeneous multiprocessors. In ESTIMedia, 2014.




Appendix 

In this appendix, we provide additional details omitted
from the main body of the paper due to space constraints.

A:  Additional  Case-Study  Details 

Additional  details  concerning  the  case  study  covered  in 
Sec.  8  are  provided  below. 

Virtual sink. $G_2$ in Fig. 6 has two sinks, so to analyze it,
a virtual sink $\tau_2^6$ must be added that has a WCET of 0 and
a response-time bound of 0. The resulting graph, which we
denote as $G_2'$, is shown in Fig. 10.


Figure 10: $G_2'$, where a virtual sink is created for $G_2$.

Relative deadlines, offsets, and response-time bounds.
Detailed data arising from the intermediate steps in
deriving the end-to-end response-time bounds in the case
study in Sec. 8 are provided in Tables 2 and 3.

B:  Additional  Details  Concerning  Early  Releasing 

As  noted  in  Sec.  7,  early  releasing  does  not  affect  the 
response-time  analysis  for  npc-sporadic  tasks  presented  pre¬ 
viously  [25]  because  this  analysis  is  based  on  the  total  de¬ 
mand  for  processing  time  due  to  jobs  with  deadlines  at  or 
before  a  particular  time  instant.  Early  releasing  does  not 
change  upper  bounds  on  such  demand,  because  every  job’s 
actual  release  time  and  hence  deadline  are  unaltered  by 
early  releasing.  Thus,  the  response-time  bounds  and  there¬ 
fore  the  end-to-end  response-time  bounds  previously  estab¬ 
lished  without  early  releasing  still  hold  with  early  releasing. 

Example 4. Consider $G_1$ in Fig. 1 again. Fig. 3 is a possible
schedule, without early releasing, for the obi-tasks that
implement $G_1$, as discussed earlier. When we allow early
releasing, we do not change any release times or deadlines, but
simply allow a job to become eligible for execution before
its release time provided its producers have finished. Fig. 11
depicts a possible schedule where early releasing is allowed,
assuming  the  same  releases  and  deadlines  as  in  Fig.  3.  Sev¬ 
eral  jobs  ( e.g .,  Jf  -[ ,  Jf  2,  J\2’  ^i,i’  and  ^12)  now  com¬ 
mence  execution  before  their  release  times.  As  a  result,  ob¬ 
served  end-to-end  response  times  are  reduced,  while  still  re¬ 


taining  all  response-time  bounds  (per-task  and  end-to-end). 


C:  Additional  Graphs 

In Sec. 9, we presented our results for N = 5, n = 20,
and edgeProb = 0.5. Now, we provide more graphs
of our results when varying those parameters. Figs. 12-23
plot AMERBs as a function of total per-CE-pool utilization,
when applying our basic techniques to arbitrary DAG-based
task sets as considered in Sec. 9.1. Figs. 24-47 provide additional
results pertaining to DAG combining as discussed
in Sec. 9.2. In particular, Figs. 24-35 plot AMERBs as a
function of total per-pool utilization, when the number of
identical DAGs per template is fixed to 40, and Figs. 36-47
plot AMERBs as a function of the number of identical
DAGs per template, when every CE pool is fully utilized
(i.e., the total utilization of each pool is eight).
Table 2: Case-study task response-time bounds and obi-task offsets assuming implicit deadlines. Bold entries denote sinks.

Table 3: Case-study relative-deadline settings, obi-task response-time bounds, and obi-task offsets when using linear programming to (a)
minimize average end-to-end response-time bounds, (b) minimize maximum end-to-end response-time bounds, and (c) minimize maximum
proportional end-to-end response-time bounds. Bold entries denote sinks.

(Assume depicted jobs are scheduled alongside other jobs, which are not shown.)


Figure 11: Example schedule of the obi-tasks corresponding to the DAG-based tasks in $G_1$ in Fig. 1, when early releasing is allowed.
Figure 12: AMERBs as a function of total utilization in each CE
pool in the case where each task set has five DAGs, ten tasks per
DAG, and edgeProb=0.2.


Figure  13:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  five  DAGs,  ten  tasks  per 
DAG,  and  edgeProb=0.5. 


Figure  14:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  five  DAGs,  ten  tasks  per 
DAG,  and  edgeProb=0.8. 


Figure  15:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  five  DAGs,  20  tasks  per 
DAG,  and  edgeProb=0.2. 


Figure  16:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  five  DAGs,  20  tasks  per 
DAG,  and  edgeProb=0.5. 


Figure  17:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  five  DAGs,  20  tasks  per 
DAG,  and  edgeProb=0.8. 
Figure 18: AMERBs as a function of total utilization in each CE
pool in the case where each task set has ten DAGs, ten tasks per
DAG, and edgeProb=0.2.


Figure  19:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  ten  DAGs,  ten  tasks  per 
DAG,  and  edgeProb=0.5. 


Figure  20:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  ten  DAGs,  ten  tasks  per 
DAG,  and  edgeProb=0.8. 


Figure  21:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  ten  DAGs,  20  tasks  per 
DAG,  and  edgeProb=0.2. 


Figure  22:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  ten  DAGs,  20  tasks  per 
DAG,  and  edgeProb=0.5. 


Figure  23:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  each  task  set  has  ten  DAGs,  20  tasks  per 
DAG,  and  edgeProb=0.8. 
Figure 24: AMERBs as a function of total utilization in each CE
pool in the case where the number of identical DAGs per template
is fixed to 40, each task set has five templates, each DAG has ten
tasks, and edgeProb=0.2.


Figure  25:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  five  templates,  each  DAG  has  ten 
tasks,  and  edgeProb=0.5. 


Figure  26:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  five  templates,  each  DAG  has  ten 
tasks,  and  edgeProb=0.8. 


Figure  27:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  five  templates,  each  DAG  has  20 
tasks,  and  edgeProb=0.2. 


Figure  28:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  five  templates,  each  DAG  has  20 
tasks,  and  edgeProb=0.5. 


Figure  29:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  five  templates,  each  DAG  has  20 
tasks,  and  edgeProb=0.8. 


Figure  30:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  ten  templates,  each  DAG  has  ten 
tasks,  and  edgeProb=0.2. 


Figure  31:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  ten  templates,  each  DAG  has  ten 
tasks,  and  edgeProb=0.5. 


Figure  32:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  ten  templates,  each  DAG  has  ten 
tasks,  and  edgeProb=0.8. 


Figure  33:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  ten  templates,  each  DAG  has  20 
tasks,  and  edgeProb=0.2. 


Figure  34:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  ten  templates,  each  DAG  has  20 
tasks,  and  edgeProb=0.5. 


Figure  35:  AMERBs  as  a  function  of  total  utilization  in  each  CE 
pool  in  the  case  where  the  number  of  identical  DAGs  per  template 
is  fixed  to  40,  each  task  set  has  ten  templates,  each  DAG  has  20 
tasks,  and  edgeProb=0.8. 


Figure  36:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  five  templates,  each  DAG 
has  ten  tasks,  and  edgeProb=0.2. 


Figure  37:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  five  templates,  each  DAG 
has  ten  tasks,  and  edgeProb=0.5. 


Figure  38:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  five  templates,  each  DAG 
has  ten  tasks,  and  edgeProb=0.8. 


Figure  39:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  five  templates,  each  DAG 
has  20  tasks,  and  edgeProb=0.2. 


Figure  40:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  five  templates,  each  DAG 
has  20  tasks,  and  edgeProb=0.5. 


Figure  41:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  five  templates,  each  DAG 
has  20  tasks,  and  edgeProb=0.8. 


Figure  42:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  ten  templates,  each  DAG 
has  ten  tasks,  and  edgeProb=0.2. 


Figure  43:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  ten  templates,  each  DAG 
has  ten  tasks,  and  edgeProb=0.5. 


Figure  44:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  ten  templates,  each  DAG 
has  ten  tasks,  and  edgeProb=0.8. 


Figure  45:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  ten  templates,  each  DAG 
has  20  tasks,  and  edgeProb=0.2. 


Figure  46:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  ten  templates,  each  DAG 
has  20  tasks,  and  edgeProb=0.5. 


Figure  47:  AMERBs  as  a  function  of  the  number  of  identical 
DAGs  per  template  in  the  case  where  total  utilization  in  each  CE 
pool  is  fixed  to  eight,  each  task  set  has  ten  templates,  each  DAG 
has  20  tasks,  and  edgeProb=0.8. 